<!DOCTYPE html>
<html lang="" xml:lang="">
  <head>
    <title>Data types</title>
    <meta charset="utf-8" />
    <meta name="author" content="datasciencebox.org" />
    <script src="libs/header-attrs/header-attrs.js"></script>
    <link href="libs/font-awesome/css/all.css" rel="stylesheet" />
    <link href="libs/font-awesome/css/v4-shims.css" rel="stylesheet" />
    <link href="libs/panelset/panelset.css" rel="stylesheet" />
    <script src="libs/panelset/panelset.js"></script>
    <script src="libs/htmlwidgets/htmlwidgets.js"></script>
    <link href="libs/datatables-css/datatables-crosstalk.css" rel="stylesheet" />
    <script src="libs/datatables-binding/datatables.js"></script>
    <script src="libs/jquery/jquery-3.6.0.min.js"></script>
    <link href="libs/dt-core/css/jquery.dataTables.min.css" rel="stylesheet" />
    <link href="libs/dt-core/css/jquery.dataTables.extra.css" rel="stylesheet" />
    <script src="libs/dt-core/js/jquery.dataTables.min.js"></script>
    <link href="libs/crosstalk/css/crosstalk.min.css" rel="stylesheet" />
    <script src="libs/crosstalk/js/crosstalk.min.js"></script>
    <link rel="stylesheet" href="../xaringan-themer.css" type="text/css" />
    <link rel="stylesheet" href="../slides.css" type="text/css" />
  </head>
  <body>
    <textarea id="source">
class: center, middle, inverse, title-slide

.title[
# Data types
]
.subtitle[
## <br><br> Data Science in a Box
]
.author[
### <a href="https://datasciencebox.org/">datasciencebox.org</a>
]

---





layout: true
  
&lt;div class="my-footer"&gt;
&lt;span&gt;
&lt;a href="https://datasciencebox.org" target="_blank"&gt;datasciencebox.org&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt; 

---



class: middle

# Why should you care about data types?

---

## Example: Cat lovers

A survey asked respondents their name and number of cats. The instructions said to enter the number of cats as a numerical value.


```r
cat_lovers &lt;- read_csv("data/cat-lovers.csv")
```


```
## # A tibble: 60 × 3
##   name           number_of_cats handedness
##   &lt;chr&gt;          &lt;chr&gt;          &lt;chr&gt;     
## 1 Bernice Warren 0              left      
## 2 Woodrow Stone  0              left      
## 3 Willie Bass    1              left      
## 4 Tyrone Estrada 3              left      
## 5 Alex Daniels   3              left      
## 6 Jane Bates     2              left      
## # … with 54 more rows
```

---

## Oh why won't you work?!


```r
cat_lovers %&gt;%
  summarise(mean_cats = mean(number_of_cats))
```

```
## Warning in mean.default(number_of_cats): argument is not numeric
## or logical: returning NA
```

```
## # A tibble: 1 × 1
##   mean_cats
##       &lt;dbl&gt;
## 1        NA
```

---


```r
?mean
```

&lt;img src="img/mean-help.png" width="75%" style="display: block; margin: auto;" /&gt;

---

## Oh why won't you still work??!!


```r
cat_lovers %&gt;%
  summarise(mean_cats = mean(number_of_cats, na.rm = TRUE))
```

```
## Warning in mean.default(number_of_cats, na.rm = TRUE): argument
## is not numeric or logical: returning NA
```

```
## # A tibble: 1 × 1
##   mean_cats
##       &lt;dbl&gt;
## 1        NA
```

---

## Take a breath and look at your data

.question[
What is the type of the `number_of_cats` variable?
]


```r
glimpse(cat_lovers)
```

```
## Rows: 60
## Columns: 3
## $ name           &lt;chr&gt; "Bernice Warren", "Woodrow Stone", "Will…
## $ number_of_cats &lt;chr&gt; "0", "0", "1", "3", "3", "2", "1", "1", …
## $ handedness     &lt;chr&gt; "left", "left", "left", "left", "left", …
```

---

## Let's take another look

.small[
<div id="htmlwidget-1b4ff99564eb6e8884a5" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-1b4ff99564eb6e8884a5">{"x":{"filter":"none","vertical":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60"],["Bernice Warren","Woodrow Stone","Willie Bass","Tyrone Estrada","Alex Daniels","Jane Bates","Latoya Simpson","Darin Woods","Agnes Cobb","Tabitha Grant","Perry Cross","Wanda Silva","Alicia Sims","Emily Logan","Woodrow Elliott","Brent Copeland","Pedro Carlson","Patsy Luna","Brett Robbins","Oliver George","Calvin Perry","Lora Gutierrez","Charlotte Sparks","Earl Mack","Leslie Wade","Santiago Barker","Jose Bell","Lynda Smith","Bradford Marshall","Irving Miller","Caroline Simpson","Frances Welch","Melba Jenkins","Veronica Morales","Juanita Cunningham","Maurice Howard","Teri Pierce","Phil Franklin","Jan Zimmerman","Leslie Price","Bessie Patterson","Ethel Wolfe","Naomi Wright","Sadie Frank","Lonnie Cannon","Tony Garcia","Darla Newton","Ginger Clark","Lionel Campbell","Florence Klein","Harriet Leonard","Terrence Harrington","Travis Garner","Doug Bass","Pat Norris","Dawn Young","Shari Alvarez","Tamara Robinson","Megan Morgan","Kara Obrien"],["0","0","1","3","3","2","1","1","0","0","0","0","1","3","3","2","1","1","0","0","1","1","0","0","4","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","3","3","2","1","1.5 - honestly I think one of my cats is half human","0","0","1","0","1","three","1","1","1","0","0","2"],["left","left","left","left","left","left","left","left","left","left","left","left","left","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","ambidextrous","ambidextrous","ambidextrous","ambidextrous","ambidextrous"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>name<\/th>\n      <th>number_of_cats<\/th>\n      <th>handedness<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>
]

---

## Sometimes you might need to babysit your respondents

.midi[

```r
cat_lovers %&gt;%
  mutate(number_of_cats = case_when(
    name == "Ginger Clark" ~ 2,
    name == "Doug Bass"    ~ 3,
    TRUE                   ~ as.numeric(number_of_cats)
    )) %&gt;%
  summarise(mean_cats = mean(number_of_cats))
```

```
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced
## by coercion
```

```
## # A tibble: 1 × 1
##   mean_cats
##       &lt;dbl&gt;
## 1     0.833
```
]

---

## Always you need to respect data types


```r
cat_lovers %&gt;%
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats
      ),
    number_of_cats = as.numeric(number_of_cats)
    ) %&gt;%
  summarise(mean_cats = mean(number_of_cats))
```

```
## # A tibble: 1 × 1
##   mean_cats
##       &lt;dbl&gt;
## 1     0.833
```

---

## Now that we know what we're doing...


```r
*cat_lovers &lt;- cat_lovers %&gt;%
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats
      ),
    number_of_cats = as.numeric(number_of_cats)
    )
```

---

## Moral of the story

- If your data does not behave how you expect it to, type coercion upon reading in the data might be the reason.
- Go in and investigate your data, apply the fix, *save your data*, live happily ever after.

---

class: middle

.hand[.light-blue[now that we have a good motivation for]]  
.hand[.light-blue[learning about data types in R]]

&lt;br&gt;

.large[
.hand[.light-blue[let's learn about data types in R!]]
]

---

class: middle

# Data types

---

## Data types in R

- **logical**
- **double**
- **integer**
- **character**
- and some more, but we won't be focusing on those

---

## Logical &amp; character

.pull-left[
**logical** - boolean values `TRUE` and `FALSE`


```r
typeof(TRUE)
```

```
## [1] "logical"
```
]
.pull-right[
**character** - character strings


```r
typeof("hello")
```

```
## [1] "character"
```
]

---

## Double &amp; integer

.pull-left[
**double** - floating point numerical values (default numerical type)


```r
typeof(1.335)
```

```
## [1] "double"
```

```r
typeof(7)
```

```
## [1] "double"
```
]
.pull-right[
**integer** - integer numerical values (indicated with an `L`)


```r
typeof(7L)
```

```
## [1] "integer"
```

```r
typeof(1:3)
```

```
## [1] "integer"
```
]

---

## Concatenation

Vectors can be constructed using the `c()` function.


```r
c(1, 2, 3)
```

```
## [1] 1 2 3
```

```r
c("Hello", "World!")
```

```
## [1] "Hello"  "World!"
```

```r
c(c("hi", "hello"), c("bye", "jello"))
```

```
## [1] "hi"    "hello" "bye"   "jello"
```

---

## Converting between types

.hand[with intention...]

.pull-left[

```r
x &lt;- 1:3
x
```

```
## [1] 1 2 3
```

```r
typeof(x)
```

```
## [1] "integer"
```
]
--
.pull-right[

```r
y &lt;- as.character(x)
y
```

```
## [1] "1" "2" "3"
```

```r
typeof(y)
```

```
## [1] "character"
```
]

---

## Converting between types

.hand[with intention...]

.pull-left[

```r
x &lt;- c(TRUE, FALSE)
x
```

```
## [1]  TRUE FALSE
```

```r
typeof(x)
```

```
## [1] "logical"
```
]
--
.pull-right[

```r
y &lt;- as.numeric(x)
y
```

```
## [1] 1 0
```

```r
typeof(y)
```

```
## [1] "double"
```
]

---

## Converting between types

.hand[without intention...]

R will happily convert between various types without complaint when different types of data are concatenated in a vector, and that's not always a great thing!

.pull-left[

```r
c(1, "Hello")
```

```
## [1] "1"     "Hello"
```

```r
c(FALSE, 3L)
```

```
## [1] 0 3
```
]
--
.pull-right[

```r
c(1.2, 3L)
```

```
## [1] 1.2 3.0
```

```r
c(2L, "two")
```

```
## [1] "2"   "two"
```
]

---

## Explicit vs. implicit coercion

Let's give formal names to what we've seen so far:

--
- **Explicit coercion** is when you call a function like `as.logical()`, `as.numeric()`, `as.integer()`, `as.double()`, or `as.character()`


--
- **Implicit coercion** happens when you use a vector in a specific context that expects a certain type of vector

---

.midi[
.your-turn[
### .hand[Your turn!]

- RStudio Cloud &gt; `AE 05 - Hotels + Data types` &gt; open `type-coercion.Rmd` and knit.
- What is the type of the given vectors? First, guess. Then, try it out in R. 
If your guess was correct, great! If not, discuss why they have that type.
]
]

--

.small[
**Example:** Suppose we want to know the type of `c(1, "a")`. First, I'd look at: 

.pull-left[

```r
typeof(1)
```

```
## [1] "double"
```
]
.pull-right[

```r
typeof("a")
```

```
## [1] "character"
```
]

and make a guess based on these. Then finally I'd check:
.pull-left[

```r
typeof(c(1, "a"))
```

```
## [1] "character"
```
]
]

---

class: middle

# Special values

---

## Special values

- `NA`: Not available
- `NaN`: Not a number
- `Inf`: Positive infinity
- `-Inf`: Negative infinity

--

.pull-left[

```r
pi / 0
```

```
## [1] Inf
```

```r
0 / 0
```

```
## [1] NaN
```
]
.pull-right[

```r
1/0 - 1/0
```

```
## [1] NaN
```

```r
1/0 + 1/0
```

```
## [1] Inf
```
]

---

## `NA`s are special ❄️s


```r
x &lt;- c(1, 2, 3, 4, NA)
```


```r
mean(x)
```

```
## [1] NA
```

```r
mean(x, na.rm = TRUE)
```

```
## [1] 2.5
```

```r
summary(x)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    1.75    2.50    2.50    3.25    4.00       1
```

---

## `NA`s are logical

R uses `NA` to represent missing values in its data structures.


```r
typeof(NA)
```

```
## [1] "logical"
```

---

## Mental model for `NA`s

- Unlike `NaN`, `NA`s are genuinely unknown values
- But that doesn't mean they can't function in a logical way
- Let's think about why `NA`s are logical...

--

.question[
Why do the following give different answers?
]
.pull-left[

```r
# TRUE or NA
TRUE | NA
```

```
## [1] TRUE
```
]
.pull-right[

```r
# FALSE or NA
FALSE | NA
```

```
## [1] NA
```
]

`\(\rightarrow\)` See next slide for answers...

---

- `NA` is unknown, so it could be `TRUE` or `FALSE`

.pull-left[
.midi[
- `TRUE | NA`

```r
TRUE | TRUE  # if NA was TRUE
```

```
## [1] TRUE
```

```r
TRUE | FALSE # if NA was FALSE
```

```
## [1] TRUE
```
]
]
.pull-right[
.midi[
- `FALSE | NA`

```r
FALSE | TRUE  # if NA was TRUE
```

```
## [1] TRUE
```

```r
FALSE | FALSE # if NA was FALSE
```

```
## [1] FALSE
```
]
]

- Doesn't make sense for mathematical operations 
- Makes sense in the context of missing data
    </textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script>var slideshow = remark.create({
"ratio": "16:9",
"highlightLines": true,
"highlightStyle": "solarized-light",
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
  window.dispatchEvent(new Event('resize'));
});
(function(d) {
  var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
  if (!r) return;
  s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
  d.head.appendChild(s);
})(document);

(function(d) {
  var el = d.getElementsByClassName("remark-slides-area");
  if (!el) return;
  var slide, slides = slideshow.getSlides(), els = el[0].children;
  for (var i = 1; i < slides.length; i++) {
    slide = slides[i];
    if (slide.properties.continued === "true" || slide.properties.count === "false") {
      els[i - 1].className += ' has-continuation';
    }
  }
  var s = d.createElement("style");
  s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
  d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
  var deleted = false;
  slideshow.on('beforeShowSlide', function(slide) {
    if (deleted) return;
    var sheets = document.styleSheets, node;
    for (var i = 0; i < sheets.length; i++) {
      node = sheets[i].ownerNode;
      if (node.dataset["target"] !== "print-only") continue;
      node.parentNode.removeChild(node);
    }
    deleted = true;
  });
})();
// add `data-at-shortcutkeys` attribute to <body> to resolve conflicts with JAWS
// screen reader (see PR #262)
(function(d) {
  let res = {};
  d.querySelectorAll('.remark-help-content table tr').forEach(tr => {
    const t = tr.querySelector('td:nth-child(2)').innerText;
    tr.querySelectorAll('td:first-child .key').forEach(key => {
      const k = key.innerText;
      if (/^[a-z]$/.test(k)) res[k] = t;  // must be a single letter (key)
    });
  });
  d.body.setAttribute('data-at-shortcutkeys', JSON.stringify(res));
})(document);
(function() {
  "use strict"
  // Replace <script> tags in slides area to make them executable
  var scripts = document.querySelectorAll(
    '.remark-slides-area .remark-slide-container script'
  );
  if (!scripts.length) return;
  for (var i = 0; i < scripts.length; i++) {
    var s = document.createElement('script');
    var code = document.createTextNode(scripts[i].textContent);
    s.appendChild(code);
    var scriptAttrs = scripts[i].attributes;
    for (var j = 0; j < scriptAttrs.length; j++) {
      s.setAttribute(scriptAttrs[j].name, scriptAttrs[j].value);
    }
    scripts[i].parentElement.replaceChild(s, scripts[i]);
  }
})();
(function() {
  var links = document.getElementsByTagName('a');
  for (var i = 0; i < links.length; i++) {
    if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
      links[i].target = '_blank';
    }
  }
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
  const hlines = d.querySelectorAll('.remark-code-line-highlighted');
  const preParents = [];
  const findPreParent = function(line, p = 0) {
    if (p > 1) return null; // traverse up no further than grandparent
    const el = line.parentElement;
    return el.tagName === "PRE" ? el : findPreParent(el, ++p);
  };

  for (let line of hlines) {
    let pre = findPreParent(line);
    if (pre && !preParents.includes(pre)) preParents.push(pre);
  }
  preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>

<script>
slideshow._releaseMath = function(el) {
  var i, text, code, codes = el.getElementsByTagName('code');
  for (i = 0; i < codes.length;) {
    code = codes[i];
    if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
      text = code.textContent;
      if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
          /^\$\$(.|\s)+\$\$$/.test(text) ||
          /^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
        code.outerHTML = code.innerHTML;  // remove <code></code>
        continue;
      }
    }
    i++;
  }
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
  var script = document.createElement('script');
  script.type = 'text/javascript';
  script.src  = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
  if (location.protocol !== 'file:' && /^https?:/.test(script.src))
    script.src  = script.src.replace(/^https?:/, '');
  document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
  </body>
</html>
